arXiv:1502.04265v3 [cs.DS] 28 May 2015 


Solving k-means on High-dimensional Big Data 


Jan-Philipp W. Kappmeier^1, Daniel R. Schmidt^2 and Melanie Schmidt^2 

^1 Technische Universität Berlin, Germany, kappmeier@math.tu-berlin.de 
^2 Carnegie Mellon University, Pittsburgh PA, {schmidtd,mschmidl}@andrew.cmu.edu 


June 1, 2015 


In recent years, there have been major efforts to develop data stream algorithms that 
process inputs in one pass over the data with little memory requirement. For the k-means 
problem, this has led to the development of several (1 + ε)-approximations (under the 
assumption that k is a constant), but also to the design of algorithms that are extremely 
fast in practice and compute solutions of high accuracy. However, when not only the length 
of the stream is high but also the dimensionality of the input points, then current methods 
reach their limits. 

We propose two algorithms, piecy and piecy-mr that are based on the recently developed 
data stream algorithm BICO that can process high dimensional data in one pass and output 
a solution of high quality. While piecy is suited for high dimensional data with a medium 
number of points, piecy-mr is meant for high dimensional data that comes in a very long 
stream. We provide an extensive experimental study to evaluate piecy and piecy-mr that 
shows the strength of the new algorithms. 


1 Introduction 

Partitioning points into subsets (clusters) with similar properties is an intuitive, old and central 
question. Unsupervised clustering aims at finding structure in data without the aid of class labels or 
an expert’s opinion. It has many applications ranging from computer science applications like image 
segmentation or information retrieval to applications in other sciences like biology or physics where 
it is used on genome data and CERN experiments. For an overview on the broad subject, see for 
example the survey by Jain [13]. The k-means problem asks to cluster data such that the sum of the 
squared error is minimized. It has been studied since the fifties [17, 23] and optimizing it is likely ‘the 
most commonly used partitional clustering strategy’ [14]. It measures the quality of a partitioning of 
points from ℝ^d based on the squared Euclidean distance function. Each cluster in the partitioning is 
represented by a center, and the objective function is the sum of the squared distances of all points 
to their respective center. 

The popularity of the k-means problem is underlined by the fact that the most popular algorithm 
for it, Lloyd’s algorithm, was named one of the ten most influential algorithms in the data mining 
community by the organizers of the IEEE International Conference on Data Mining (ICDM) in 2008, 
see Wu et al. [25]. Lloyd’s algorithm [18] (independently developed by Steinhaus [23]) is a local 
search heuristic. First, it obtains an initial solution consisting of 
k centers, e.g., by drawing k centers uniformly at random from the input. Then, the following two 
steps are alternated: Assign every point to its closest center to obtain a partitioning into k subsets, 
then compute the centroid of each subset and replace the center by this centroid. Neither step can 
increase the cost: Assigning points to their closest center is optimal for the given centers, and for each 
subset, the centroid is the optimal center. Thus, the new solution is either cheaper or of equal cost. 
In the latter case, the algorithm has converged^1. 
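The two alternating steps can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and a dense in-memory array of points; the function names `lloyd` and `kmeans_cost`, the fixed iteration cap and the handling of empty clusters (keeping the old center) are choices of this sketch, not taken from any implementation discussed in this paper.

```python
import numpy as np

def kmeans_cost(points, centers):
    """Sum of squared distances of all points to their closest center."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()

def lloyd(points, k, iterations=20, seed=0):
    """Lloyd's heuristic: alternate the assignment and the centroid step."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial solution: k distinct input points drawn uniformly at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iterations):
        # Step 1: assign every point to its closest center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: replace each center by the centroid of its subset.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # no center moved: the cost can no longer decrease
        centers = new_centers
    return centers, kmeans_cost(points, centers)
```

Since neither step increases the cost, running the sketch for more iterations from the same start solution never yields a worse result.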

The quality in terms of the sum of squared errors of the output of Lloyd’s algorithm depends on 
the local optimum that is reached. Finding a good local optimum can be achieved by initializing the 
algorithm with a good initial solution. Arthur and Vassilvitskii [3] propose the k-means++ method 
as an improved version of Lloyd’s algorithm. It chooses the initial solution randomly, but only the 
first center is chosen uniformly at random. The ith center is chosen by computing all points’ squared 
distances to their closest center and then choosing each point with a probability proportional to its 
cost as the next center. This way, it is likely that most optimal centers have a close center in the 
start solution. This initialization method produces centers which are an O(log k)-approximation in 
expectation, and experiments indicate that the local optimum found from this start solution is usually 
of high quality. 
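The seeding step described above can be illustrated as follows. This is a minimal sketch assuming NumPy; the function name `kmeanspp_seeding` and the uniform fallback for the degenerate case in which all points coincide with a chosen center are assumptions of this example.

```python
import numpy as np

def kmeanspp_seeding(points, k, seed=0):
    """Choose k start centers by sampling proportionally to squared distance."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(points)
    # Only the first center is chosen uniformly at random.
    chosen = [int(rng.integers(n))]
    # d2[i] = squared distance of point i to its closest chosen center.
    d2 = ((points - points[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total > 0:
            probs = d2 / total
        else:
            probs = np.full(n, 1.0 / n)  # all points already have cost zero
        nxt = int(rng.choice(n, p=probs))
        chosen.append(nxt)
        # Update the cost of every point with respect to the new center.
        d2 = np.minimum(d2, ((points - points[nxt]) ** 2).sum(axis=1))
    return points[chosen]
```

The returned centers would then serve as the initial solution for Lloyd's algorithm.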

The k-means++ method therefore provides a great tool for solving the k-means problem in practice, 
with an (expected) worst-case guarantee, a very good practical performance and the advantage that it 
is very easy to implement. The theoretically best approximation algorithms for the k-means problem 
provide a constant factor approximation for the general case [15, 16] and a (1 + ε)-approximation (even 
in linear time) if k and ε are assumed to be constants [8]. 

For big data, running Lloyd’s algorithm or k-means++ is less viable. Asymptotically, the running 
time of both algorithms is O(ndk) if the number of iterations is bounded by a constant. This looks 
convincing since a straightforward implementation of finding the closest center for a point takes Θ(dk) 
time, so even evaluating a solution then has running time Θ(ndk). Additionally, the input size is 
already Θ(nd), so the running time is linear for constant values of k. However, both algorithms need 
random access to the data and iterate over it several times. As soon as the data does not fit into 
main memory, the algorithms thus do not scale very well. For example, k-means++ needed over seven 
hours to compute 50 centers for a 54-dimensional data set (Covertype) with half a million points [1]. 

A natural strategy to cope with this problem is to summarize the data before running the respective 
algorithm. A famous example for this is BIRCH [26], a SIGMOD Test of Time Award winning 
algorithm that computes a summary by one pass over the input data and then clusters the points in 
the summary. BIRCH is very fast and thus enables the processing of large data sets. However, the 
quality in terms of the sum of squared errors can be low [1, 10]. 

A more recent development is the design of fast data stream algorithms that are based on coresets. 
A coreset S of a point set P is a weighted summary of P that maintains a strong quality guarantee: 
For any choice C of k centers, the k-means cost of the clustering induced by C on S is within a 
(1 + ε)-factor of the cost of the k-means clustering that C induces on P. Thus, executing any k-means algorithm 
on the coreset gives a good approximation of what the same algorithm would have produced on P. 
Coreset constructions are generally designed with a focus on strong theoretic bounds, but can be made 
viable in practice with slight heuristic changes. 

StreamKM++ is such an algorithm [1]. It computes a coreset in one pass over the data and then 
runs k-means++ on the coreset. The size of the coreset is polylogarithmic in the input size if the 
dimension of the data is constant. The total memory requirement is also polylogarithmic. Experiments 
show that the quality of the solutions is comparable to the k-means++ solutions (on the full data set) 
while the running time is a small fraction. For example, the above mentioned Covertype data set is processed 
in ten minutes instead of seven hours, with a result of similar quality. 

BICO is a recent algorithm that outperforms StreamKM++ on all data sets that are tested in [1, 10] 
and enables the processing of data sets with millions of points in less than an hour^2. The above 
mentioned test case needs 27 seconds instead of ten minutes for StreamKM++ and seven hours for 
k-means++, and larger instances show even higher acceleration. BICO is also based on a coreset 
construction, using a slight variation of an algorithm with a strong theoretical guarantee. The quality 
of the computed solutions in experiments is as good as that of StreamKM++. The source code of 
BICO is written in C++ and is available online. 

^1 Since there are finitely many partitionings, the algorithm eventually converges to a local optimum. It is also common 
to stop the algorithm after a predefined number of iterations, or when the decrease of the cost function is small. 

^2 One data set is BigCross, containing three million points in 68 dimensions; it is processed in under twenty minutes 
for k < 250. 

For data sets with up to around 100 dimensions, this is a pleasant state of affairs. However, both 
the analysis of the running time and memory requirement of StreamKM++ and BICO assume that 
the dimension is a constant. At least for BICO, this is not a theoretically imposed restriction, but 
does indeed correspond to an unfavorable dependency on the dimension. The reason is that BICO 
covers the input data by spheres (in order to summarize all points in the same sphere by one point). 
When the number of spheres is too large, a rebuilding step reduces it by merging certain spheres. 
Covering a set by spheres gets increasingly difficult as the dimension gets higher, which results in 
several rebuilding steps of BICO, and in a higher running time. 

On the theoretical level, however, there are several results saying that it is possible to compute 
a coreset of a point set in one pass and with low memory requirements. For example, Feldman 
and Langberg [8] propose a one-pass algorithm that computes a coreset with storage size of 
O [kd log'^ ns~^ log I/e). It is thus theoretically possible to compute coresets which scale well with the 
dimension, but there is no practical algorithm yet that achieves a high quality summary and can cope 
with very high dimensional, large data sets. 

1.1 Our Contribution 

We develop two new algorithms, piecy and piecy-mr, that can deal with high-dimensional big data. For 
that, we combine BICO with a dimensionality reduction. This reduction is done by projecting onto 
the best fit subspace (of a parameterized dimension) which can be computed by the singular value 
decomposition (SVD). This is theoretically supported by recent results [5, 9] that say that projecting 
onto the best fit subspace of dimension ⌈k/ε⌉ and then solving the k-means problem gives a 
(1 + ε)-approximation guarantee. We find that 3k/2 dimensions are often sufficient to give highly accurate 
results. This might be due to the spectrum of the data we used. 

The next challenge is to intertwine the dimensionality reduction with the coreset computation in 
order to do both in one pass over the data. The first algorithm, piecy, reads chunks (pieces) of the 
data, reduces the dimensionality of each chunk and feeds the resulting points into BICO. 
The drawback of this approach is that the total dimensionality of the complete point set that is fed 
into BICO increases with the number of pieces. For large data sets and high input dimension, this 
approach will eventually run into the same trouble as BICO (but only for data sets that are larger and 
higher dimensional than those BICO can process). In piecy-mr, we resolve this potential limitation 
by adapting a technique called Merge-and-Reduce [12]. It is a method that shows that any coreset 
computation can be turned into a one-pass algorithm at the cost of additional polylogarithmic factors. 
We adapt it to take advantage of the fact that we use a coreset computation (BICO) which already is 
a one-pass algorithm. 

As intermediate steps of our work, we evaluate two implementations for the singular value decomposition, 
an implementation in Lapack++ [24] and the implementation called redSVD [21]. We compare 
their speed and quality. Furthermore, we extend the algorithm BICO to process weighted inputs 
(which is necessary for our piecy-mr approach). 

2 The algorithms 

In the following, we describe the three algorithms that we tested: BICO and our two new algorithms, 
piecy and piecy-mr. For a point set P, we denote the centroid of P by $\mu(P) := \sum_{x \in P} x / |P|$. 

2.1 BICO 

BICO uses a data structure based on clustering features. A clustering feature of a point set S consists 
of the number of points |S|, the sum of the points $\sum_{x \in S} x$ and the sum of the squared lengths of the 
points $\sum_{x \in S} \|x\|^2$. By the well-known formula 

$$\sum_{x \in P} \|x - c\|^2 = |P| \cdot \|\mu(P) - c\|^2 + \sum_{x \in P} \|x - \mu(P)\|^2,$$

which holds for every point set P, a clustering feature is enough to exactly compute the cost between 
a point set and one center. BICO uses spheres that cover the input data. The point set inside each 
sphere is represented by a clustering feature. When a point arrives, it can be added to a clustering 
feature in constant time. The challenge for BICO is to decide into which clustering feature a point shall 
be added in order to equally distribute the error and to keep the overall error small. This is achieved 
by managing the clustering features in a well organized tree. Finding an appropriate clustering feature 
to add a point dominates the insertion time of a point. It lies between O(1) and O(m) for each point, 
where m is the coreset size. BICO includes several heuristics to speed up the identification process of 
a good clustering feature such that the running time is often closer to O(1) per point. How well these 
heuristics work depends on the dimension of the input point set. 
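To make the role of a clustering feature concrete, the following minimal sketch (assuming NumPy) stores only the three quantities named above and evaluates the cost to an arbitrary center via the stated formula. The class and attribute names are hypothetical and do not mirror the C++ implementation of BICO; the tree structure and the spheres are omitted entirely.

```python
import numpy as np

class ClusteringFeature:
    """Number of points, sum of points and sum of squared lengths."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = np.zeros(dim)
        self.squared_sum = 0.0

    def insert(self, x):
        """Add one point in time linear in the dimension only."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.linear_sum += x
        self.squared_sum += float(x @ x)

    def centroid(self):
        return self.linear_sum / self.n

    def cost(self, center):
        """Exact sum of squared distances of the stored points to center."""
        c = np.asarray(center, dtype=float)
        mu = self.centroid()
        # sum ||x - mu||^2 = sum ||x||^2 - n * ||mu||^2
        internal = self.squared_sum - self.n * float(mu @ mu)
        diff = mu - c
        return self.n * float(diff @ diff) + internal
```

The cost is obtained without access to the individual points, which is what allows the points inside a sphere to be discarded after insertion.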

Whenever the number of spheres (and thus clustering features) exceeds m, BICO performs a 
rebuilding step that merges some of the spheres and their features together. For high-dimensional data 
sets, this may occur more often unless the spheres become large enough. More rebuilding steps imply 
a higher running time. 

2.2 Piecy 

Our aim is to compute coresets for large high-dimensional data sets by using BICO and dimensionality 
reduction techniques, but in only one pass over the data. Piecy pursues the idea of running only a 
single instantiation of BICO and subsequently feeding it with chunks of low dimensional points. Thus, 
piecy reads a piece of p points, reduces its intrinsic dimension and inputs the resulting points into 
BICO. 

Choice of dimensionality reduction technique and number of dimensions. We use the projection 
to the best fit subspace of dimension ℓ, where ℓ is a parameter to be optimized. The best fit subspace 
can be computed by using the singular value decomposition. The theoretical background of this 
approach is that projecting to best fit subspaces yields a good approximation of the squared pairwise 
distances [5, 6]. When projecting to k dimensions, a 2-approximation is guaranteed, while projecting 
to ⌈k/ε⌉ guarantees a (1 + ε)-approximation. Thus, we test values between k and moderate multiples 
of k to get a reasonable compromise between approximation factor and running time. 

Using SVD to project to the best fit subspace. When we say that we use ‘the’ SVD, we mean the 
SVD of the matrix $A \in \mathbb{R}^{n \times d}$ where the input points are stored in the rows. The SVD of A has the 
form $A = U D V^T$ for matrices $U \in \mathbb{R}^{n \times n}$, $D \in \mathbb{R}^{n \times d}$ and $V \in \mathbb{R}^{d \times d}$, where U and V are unitary matrices 
and D is a diagonal matrix. The matrix V contains the right singular vectors of A. The projection of 
(the points stored in) A to the best fit subspace of dimension ℓ is the matrix $A_\ell = U D_\ell V^T$, where $D_\ell$ 
is obtained by replacing all but the first ℓ diagonal elements by zero. Notice that the resulting matrix 
still contains d-dimensional points, but their intrinsic dimension is reduced to ℓ. This still helps, since 
the ℓ-dimensional point set is easier to cover for BICO. 
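The projection can be illustrated with a full SVD in NumPy. This is only a sketch: it assumes that the points are stored as the rows of `A`, and the function name is hypothetical. It zeroes all but the first ℓ singular values and multiplies the factors back together.

```python
import numpy as np

def project_to_best_fit_subspace(A, ell):
    """Return U D_ell V^T, the projection of the rows of A onto the
    best fit subspace of dimension ell."""
    A = np.asarray(A, dtype=float)
    # Reduced SVD; singular values in s are sorted in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[ell:] = 0.0  # keep only the first ell diagonal entries of D
    return (U * s_trunc) @ Vt
```

The result has the same shape as `A`, but its rank (the intrinsic dimension of the point set) is at most ℓ.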

Computation of the SVD. Numerically stable computation of the singular value decomposition is a 
research field of its own. Basic methods that compute the full SVD, i.e., U, V and D, have a running 
time of O(nd · min(n, d)). This full SVD can be used by dropping the appropriate entries of D to 
obtain a matrix $D_\ell$ and evaluating the matrix product $U D_\ell V^T$ to obtain the projection onto the best 
fit subspace of dimension ℓ. However, a variety of more efficient algorithms have been developed for this 
specific task, which are known as algorithms for the truncated SVD that compute a decomposition 
$A_\ell = U_\ell D_\ell V_\ell^T$ directly without computing the full SVD of A. Additionally, randomized variants are 
known that reduce the running time significantly at the cost of a small error. Mahoney [20] gives a 
very nice overview on different methods to compute the singular value decomposition, then continuing 
with a detailed view on randomized methods and also discussing practical aspects. For this work, we 
use an implementation that is based on the randomized algorithm presented in [11] that multiplies A 
with a randomly drawn matrix to reduce the number of its columns before computing the SVD. The 
implementation is called redSVD [21]. In addition to reducing the number of columns, it also reduces 
the number of rows before computing the SVD. Below, we experimentally compare the performance 
of redSVD to the performance of the lapack++ implementation of the full SVD computation. 

Parameters. The authors of BICO propose using a coreset size of 200k for BICO, which we adopt. 
That given, there are two parameters to be chosen: The size of the pieces that are the input for one 
SVD, and the number of dimensions we project to. As we argued above, the latter should be at least 
k and not more than a reasonable multiple of k. 

Memory requirement. At each point in time, we store at most one piece of the input, one SVD 
object and one BICO object. The memory requirement of BICO is proportional to the output size, 
i.e., to 200k. 

Obtaining a solution. Running piecy computes a summary of the input points. In order to obtain 
an actual solution for the k-means problem, we run k-means++ [3] on the summary. 

2.3 Piecy-MR 

Notice that each chunk of data that is processed by piecy adds (in the worst case) m dimensions to 
the intrinsic dimension of the point set that is stored by the BICO instance, until the maximum 
dimension is reached. For large data sets, this is unfavorable. 

Helpful coreset properties. A convenient property of coresets helps here. Assume that S_1 and S_2 are 
coresets for point sets P_1 and P_2, i.e., their weighted cost approximates the weighted cost of P_1 or P_2, 
respectively, for any possible solution, and up to an ε-fraction. Then the weighted cost of their union 
S_1 ∪ S_2 approximates the cost of P_1 ∪ P_2 for any solution up to an ε-fraction as well. Furthermore, if 
we use a coreset construction to reduce S_1 ∪ S_2 to a smaller set (since |S_1 ∪ S_2| will be larger than the 
size of one coreset), then we obtain a coreset for P_1 ∪ P_2. The error gets larger but is bounded by a 
(3ε)-fraction of the cost of P_1 ∪ P_2 (which can be compensated by choosing a smaller ε to begin with). 

The Merge-and-Reduce technique. Assume for a moment that our aim is solely to compute a coreset 
with no thoughts about the intrinsic dimension of the points, but given a coreset computation that 
needs random access to the data. Then an intuitive approach is to read chunks of the data, computing 
a coreset for each chunk and joining it with previous coresets, until the union becomes too large. Then 
we could reduce the union by another coreset construction. The problem with this approach is that 
the first chunk of the data will participate in all following reduce steps, making the error unnecessarily 
high. The Merge-and-Reduce technique [4] (for clustering for example used in [2, 12]) organizes the 
merge and reduce steps in a binary tree such that each point takes part in at most O(log n) reduce 
steps for a stream of n points. 
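The bookkeeping of the generic binary Merge-and-Reduce tree can be sketched as follows (assuming NumPy). Note the loud assumption: `reduce_placeholder` is only a weight-preserving uniform subsample that stands in for a real coreset construction and carries no quality guarantee. All names are hypothetical, and the sketch shows the classical technique, not the modified tree used by piecy-mr.

```python
import numpy as np

def reduce_placeholder(points, weights, size, rng):
    """Stand-in for a coreset construction (NOT a real coreset):
    uniform subsample whose weights are rescaled to keep the total weight."""
    if len(points) <= size:
        return points, weights
    idx = rng.choice(len(points), size=size, replace=False)
    scale = weights.sum() / weights[idx].sum()
    return points[idx], weights[idx] * scale

def merge_and_reduce(chunks, size, seed=0):
    """Process chunks one by one; levels[i] holds at most one summary that
    has taken part in i merge steps (like the digits of a binary counter)."""
    rng = np.random.default_rng(seed)
    levels = []
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        cur = reduce_placeholder(chunk, np.ones(len(chunk)), size, rng)
        i = 0
        # Merge with the stored summary of the same level and move one level up.
        while i < len(levels) and levels[i] is not None:
            pts = np.vstack([levels[i][0], cur[0]])
            wts = np.concatenate([levels[i][1], cur[1]])
            cur = reduce_placeholder(pts, wts, size, rng)
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(cur)
        else:
            levels[i] = cur
    kept = [lv for lv in levels if lv is not None]
    points = np.vstack([p for p, _ in kept])
    weights = np.concatenate([w for _, w in kept])
    return points, weights, len(levels)
```

Because a summary only moves up when it meets another summary of the same level, the number of levels, and hence the number of reduce steps any point takes part in, grows logarithmically in the number of chunks.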

Our computation tree. We have a different problem since the coreset construction that we use, 
BICO, does not require random access to the data. Instead, we wish to keep the dimension of the 
input data small. Assume we would consider this problem independently from the coreset computation, 
by just computing the SVD of chunks of the data and keeping the reduced points in memory (maybe 
performing a second pass over the data to compute the coreset). This is infeasible since the number 
of points is not reduced and hence we would store the complete data set (with a lower intrinsic 
dimension). Imagine even that at each point in time, an oracle could provide us with the best fit 
subspace of dimension ℓ of all points seen so far. We could still not easily use this information since 
the best fit subspace would change over time. So if we use one instance of BICO, and input each 
point into it, projected to the best fit subspace of all points seen so far, then we would still get a high 
intrinsic dimension for the points stored in BICO. 

Figure 1: The Merge-and-Reduce style tree built by piecy-mr with an exemplary piece size of 5 and a 
number of pieces of 3. Every chunk with piece size many points is first fed into a singular 
value decomposition. The result of the SVD contains the same number of points but has a 
smaller intrinsic dimension. It is then fed into an instantiation of the BICO algorithm. After 
number of pieces many chunks, the BICO algorithm computes a coreset of size piece size. 
Thus, we can continue on the next layer. On each layer, the number of points is reduced by 
a factor of number of pieces. We continue to call the SVD on each layer to keep the intrinsic 
dimension of the point set small. 

By also embedding BICO into the Merge-and-Reduce tree, we solve these problems. The first way of 
doing this would be to view the two steps of reducing the dimension and entering the points into BICO 
as one coreset computation, and just embed this into the Merge-and-Reduce technique. However, this 
has the drawback that we perform the same number of dimensionality reductions as we use BICO 
for reducing sets to smaller sets. We do, however, expect that the union of multiple dimensionality 
reduced sets will not immediately have a high intrinsic dimension. In particular if the data evolves 
over time, then multiple consecutive pieces of the input data will have approximately the same best 
fit subspace (but over time, the subspace will change). We add more flexibility to the algorithm by 
running more than one copy of BICO, while allowing that more than one SVD output is processed by 
the same BICO instance. The actual computation tree is visualized in Figure 1. 

Parameters. The algorithm has three parameters, the dimension that the SVD reduces to, the piece 
size which is the number of points that are read as input for one SVD computation, and the number of 
pieces, which is the number of SVD outputs that are processed by one instance of BICO. When BICO 
reaches the limit, the computed coreset is given to an SVD instance and then entered into a BICO on 
a higher level. It is convenient to set the piece size to 200k, which also means that BICO computes a 
summary of size 200k, the summary size suggested in the original BICO publication. 

Memory requirement. We store one BICO element for each level of the computation tree. The 
degree of the tree is equal to the number of pieces b, so we have log_b n levels. At each point in time, 
there is at most one SVD object in the memory since there is always at most one SVD computation at 
the same time. If the piece size is equal to 200k, then the memory requirement of each BICO element 
is proportional to 200k. 

2.3.1 Weighted BICO 

In the original implementation, BICO processes unweighted input points. In the piecy-mr computation 
tree, the instances of BICO on higher levels of the computation tree have to process weighted inputs 
(since the coreset points are weighted). Thus, we extended the source code of BICO to work for 
weighted inputs. For an input point x with weight w, we have to simulate what BICO would do for w 
copies of x. The main observation is that in most routines of BICO, multiple copies of the same point 
can be treated as one. For example, finding the closest reference point that is currently in the data 
structure can be done once and the result is then valid for all copies of x. Additionally, if we decide to 
open a new clustering feature with x as the reference point, we can insert all (not yet inserted) copies 
into this clustering feature at no cost. 

What we have to adjust is the insertion process into already existing clustering features, and the 
initial values for new clustering features. Setting the correct values for a new clustering feature is 
straightforward: The new clustering feature has reference point x, its sum of points is w · x, the sum 
of squares is $w \cdot \|x\|^2$ and the number of points stored in the feature is w. When we add w copies of a 
point x to an existing clustering feature with centroid μ and s points in it, then the actual increase of 
the error due to this is 

$$s \cdot \|\mu - \mu_n\|^2 + w \cdot \|x - \mu_n\|^2 = \frac{s w}{s + w} \|x - \mu\|^2,$$

where we denote the new centroid after adding w copies of x by $\mu_n$. We conclude that the total error 
made in the feature after inserting w points is $c + \frac{s w}{s + w} \|x - \mu\|^2$, where c denotes the original error 
made in the feature. 

The original BICO implementation would have inserted the w copies sequentially into the clustering 
feature until the feature’s threshold error of T would have been surpassed. It actually uses $\|x - \mu\|^2$ to 
measure the additional error and thus overestimates it. When adding single points, the effect of this 
overestimation decreases with each added point such that this works well for BICO. In the weighted 
version, however, using $w \cdot \|x - \mu\|^2$ can be off by a large margin. 

Instead, we compute how many copies w' of x can be inserted into the feature without surpassing 
the threshold: 

$$c + \frac{s w'}{s + w'} \|x - \mu\|^2 \le T \iff w' \left( s \|x - \mu\|^2 - T + c \right) \le sT - sc.$$

If $s \|x - \mu\|^2 - T + c \le 0$, the threshold will not be reached for any w' > 0. We can thus insert all w 
copies. Otherwise, we insert 

$$w' = \min\left\{ w, \frac{sT - sc}{s \|x - \mu\|^2 - T + c} \right\}$$

many copies of x. If the threshold is reached before all w copies of x are inserted, i.e., if w' < w, we 
continue recursively as in the original BICO implementation. 
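The threshold computation can be checked numerically with a small sketch. The function names are hypothetical, and rounding the quotient down to an integer is an assumption of this example, motivated by the fact that the weights are integral; the recursive continuation of BICO is not shown.

```python
import math

def error_after_insert(w, s, dist_sq, c):
    """Total error of a feature with s points and error c after adding
    w copies of a point at squared distance dist_sq from the centroid."""
    return c + (s * w / (s + w)) * dist_sq

def copies_that_fit(w, s, dist_sq, c, T):
    """Largest number w' <= w of copies that keeps the error at most T."""
    slack = s * dist_sq - T + c
    if slack <= 0:
        return w  # the threshold is never reached, insert all copies
    # Assumption of this sketch: round down because copies are integral.
    return min(w, math.floor((s * T - s * c) / slack))
```

For example, with s = 4, squared distance 1, c = 0 and T = 3, the sketch allows 12 of 20 copies: inserting 12 copies gives an error of exactly 3, while a 13th copy would exceed the threshold.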


2.3.2 Best fit subspace for weighted points 

The singular value decomposition of a matrix is defined in an unweighted fashion, yet we want to use 
it for reducing the dimensionality of the weighted coreset points that result from BICO runs. What we 
want to do is project the points to the best fit subspace of the point set where each point is replaced 
by several copies of itself according to its weight. Translated into the matrix notation, this means that 
we want to compute the projection of A to the best fit subspace of dimension ℓ of a matrix F which 
contains multiple copies of the points from A according to their (integral) weight^3. 

^3 The weights that are computed by BICO are always integral. In fact, they sum up to the number of points BICO has 
processed. 

Figure 2: Spectrum of two StructuredWithNoise data sets with d = 500 and d = 1000, containing 50 
clusters of 5000 points each. 


Certainly, we do not want to actually create F. Instead, we construct a matrix A' where each row 
$A_{i*}$ is replaced by $\sqrt{w_i} \cdot A_{i*}$, where $w_i$ is the weight of the ith point. By linear algebra, we can verify 
that for each pair of left and right singular vectors u and v of F with singular value σ, there exists a 
vector u' such that u' and v are a pair of left and right singular vectors of A' for the same singular 
value. The reverse direction also holds. Thus, A' and F have the same best fit subspace and we can 
compute the SVD of A' in order to obtain it. After obtaining $A'_\ell$, we divide each row i by $\sqrt{w_i}$ to get 
the projection of the points in A. Their weight does not change. 
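The row scaling argument can be verified on a small example. The sketch below assumes NumPy, points stored as rows and strictly positive weights; the function name is hypothetical and a full SVD is used instead of redSVD.

```python
import numpy as np

def weighted_projection(A, weights, ell):
    """Project the rows of A onto the best fit subspace of the weighted
    point set without materializing the matrix F of repeated rows."""
    A = np.asarray(A, dtype=float)
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))[:, None]
    A_scaled = sqrt_w * A              # row i is multiplied by sqrt(w_i)
    U, s, Vt = np.linalg.svd(A_scaled, full_matrices=False)
    s[ell:] = 0.0                      # truncate to the first ell singular values
    A_scaled_proj = (U * s) @ Vt
    return A_scaled_proj / sqrt_w      # undo the row scaling
```

On small inputs with integral weights, the result agrees with the projection of A onto the span of the first ℓ right singular vectors of the explicitly constructed matrix F.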

Notice that we cannot replace weighted points by some multiplied version when we input the points 
into BICO since the clustering behaviour of a weighted point differs from the clustering behaviour 
of any multiple (imagine a center that lies at the weighted point, so that it has no cost, but any 
multiplied point would have). 

3 Experiments 

The experiments were performed in three settings. For class I, all source codes were compiled using 
gcc 4.9.1, and experiments were performed on 20 identical machines with a 3.2 GHz AMD Phenom™ 
II X6 1090T processor and 8 GiB RAM. For class II, all source codes were compiled with gcc 4.8.2 
and all experiments were performed on 7 identical machines with a 2.8 GHz Intel® E7400 processor 
and 8 GiB RAM. In class III, all source codes were compiled with gcc 4.9.1 and all experiments were 
performed on one machine with a 2.6 GHz Intel® Core™ i5-4210M CPU processor and 16 GiB RAM. 

Our testbed consists of the following instances. Notice that we computed the spectrum for examples 
of the data set families. This gives an additional insight on the structure of the data sets. 

Caltech128 The Caltech128 instance was created from the Caltech101 image database [7] and consists 
of 128 SIFT descriptors [19], resulting in 128 dimensions and about 3.1 million points. The 
instance was used in [10] for BICO benchmarks and was provided to the authors by Grzeszick 
in a private communication. 

StructuredWithNoise The idea of the StructuredWithNoise instances is to hide ℓ ∈ ℕ random 
point sets of y ∈ ℕ points in ℝ^d. To build cluster i, select x dimensions 
D_i = {d_1, ..., d_x} ⊂ {1, ..., d} uniformly at random. Then we build the y points for cluster i: For 
point j, choose the coordinates corresponding to D_i uniformly at random from [−Δ, Δ]. Select 
the remaining coordinates, i.e., the noise, uniformly at random from [−δ, δ]. This yields an 
instance with ℓ · y points of dimension d. We fix Δ = 10, δ = 1/2. Figure 2 shows the spectrum 
of two StructuredWithNoise data sets. We see that the first singular values are large, followed 
by slowly decreasing values until the descent steepens again. 

Figure 3: Spectrum of a LowerBound data set with d = 10000 and a random data set with d = 10000. 

LowerBound Arthur and Vassilvitskii [3] propose the following class of worst-case instances for the 
k-means++ algorithm. Define the (affine) (k, Δ)-simplex as the convex combination of the k unit 
vectors e_1, ..., e_k in ℝ^k scaled by Δ > 0. Now, embed such a (k, Δ)-simplex 𝒮 in the first k 
dimensions of ℝ^{k+n}. Then use the remaining n dimensions of ℝ^{k+n} to place an (n/k, δ)-simplex S_i 
in each vertex i of 𝒮 such that all S_i use disjoint dimensions. Arthur and Vassilvitskii [3] prove 
that the k-means++ algorithm can achieve no better approximation ratio than Ω(log N) on this 
class of instances, where N is the number of input points. We use a generator by Stallmann [22] 
to generate instances of this type. We fix δ = 100 and Δ = 1000. The LowerBound data sets 
have a nice structure for our experiments since only the first singular values are significant, 
as can be seen in the left diagram in Figure 3. Notice that we computed the first 100 singular 
values for a 10000-dimensional data set. The remaining values can only be smaller. 

Random A Random data set is created by drawing random numbers from [−Δ, Δ] to form an 
n-dimensional data set with n points. We used Δ = 10. Notice that the expected directional 
width is not equal for all directions (the points are drawn uniformly from a cube, not from a 
sphere). The resulting spectrum is slightly decreasing (see Figure 3, right diagram). 
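The StructuredWithNoise construction described in the list above can be sketched as a small generator. This is an illustration assuming NumPy, not the generator used for the experiments; the function and parameter names are hypothetical, with `big` and `small` standing for Δ and δ.

```python
import numpy as np

def structured_with_noise(num_clusters, y, d, x, big=10.0, small=0.5, seed=0):
    """Generate num_clusters * y points in dimension d. For each cluster,
    x randomly selected coordinates are drawn from [-big, big], all other
    coordinates are noise drawn from [-small, small]."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(num_clusters):
        # Dimensions D_i of this cluster, chosen uniformly without repetition.
        dims = rng.choice(d, size=x, replace=False)
        block = rng.uniform(-small, small, size=(y, d))       # noise
        block[:, dims] = rng.uniform(-big, big, size=(y, x))  # structure
        blocks.append(block)
    return np.vstack(blocks)
```

Each block of y consecutive rows belongs to one cluster, and only its x selected columns can contain values outside the noise range.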

Since the algorithms are randomized, we repeated all experiments five times with the exception of 
the test cases for the three largest StructuredWithNoise data sets because of computation times. 

3.1 redSVD as a replacement for the lapack++ SVD 

Replacing the exact SVD computation in our algorithm by an approximate one as outlined in 
Section 2 can only work if the approximation is fast and provides reliable results. 

Additionally, we are interested in the factor of speed that can be gained by switching to redSVD 
from a full SVD computation. 

To evaluate the redSVD performance, we use a test bed of StructuredWithNoise instances with 
varying values for y and d thus yielding instances from small to huge size. The results are depicted in 
Table 1. 

We use redSVD to replace the input A by a matrix A'_ℓ. To measure the error of redSVD, we
compare ||A - A'_ℓ||_F to ||A - A_ℓ||_F, where A_ℓ is the matrix computed by the full SVD implementation
in lapack++. The matrix obtained by projecting A to its best fit subspace of dimension ℓ minimizes
the Frobenius norm of the difference to A, so this is a suitable measure to evaluate the redSVD result.
The table shows the deviation of redSVD compared to the Frobenius distance of the matrix computed
by the full SVD.

We performed the SVD comparison reducing the dimension to values in {100,125,150}. 

We found that the error made by redSVD is indeed very small (less than 7% in all cases) while
computation times become significantly faster: instances with 30,000 rows and 1000 columns can still
be solved by redSVD in about 3s, while lapack++ takes about 3000s on the same instance. redSVD was able
to compute approximate SVDs of matrices with 500,000 rows and 500 columns in 40s.

The limiting factor for solving larger instances is in both cases the memory limitation. The largest
instance on which we could compute a full SVD contains n = 30000 points in d = 1000 dimensions
(constructed with y = 300 and k = 100). The redSVD approach uses much smaller matrices and thus
it is possible to solve StructuredWithNoise instances up to n = 500000 and d = 500. Then, however,
it also stops working. Observe that computing the redSVD on this 500,000 × 500 matrix is still faster
than one computation of a full SVD for instances with size n = 10000 and d = 500.


Group (SWN, k = 100)      Percent error                    Full SVD CPU          redSVD CPU
                          min     max     avg     med      min    max    avg     min   max   avg
y-100-d-500-svd-100       0.52    0.79    0.66    0.68     167    169    168     0     0     0
y-100-d-500-svd-125       -2.91   -2.68   -2.76   -2.75    167    169    168     0     0     0
y-100-d-500-svd-150       -6.41   -6.19   -6.27   -6.25    167    169    168     0     0     0
y-100-d-1000-svd-100      1.38    1.51    1.44    1.45     350    353    351     0     0     0
y-100-d-1000-svd-125      -0.29   -0.13   -0.21   -0.21    350    353    351     0     0     0
y-100-d-1000-svd-150      -1.96   -1.82   -1.89   -1.88    350    353    351     0     1     1
y-200-d-500-svd-100       0.01    0.16    0.10    0.11     657    663    659     0     0     0
y-200-d-500-svd-125       -3.35   -3.14   -3.24   -3.25    657    663    659     1     1     1
y-200-d-500-svd-150       -6.77   -6.57   -6.66   -6.67    657    663    659     1     1     1
y-200-d-1000-svd-100      1.00    1.16    1.06    1.04     1347   1356   1351    1     1     1
y-200-d-1000-svd-125      -0.58   -0.43   -0.51   -0.51    1347   1356   1351    1     1     1
y-200-d-1000-svd-150      -2.17   -2.06   -2.12   -2.11    1347   1356   1351    1     2     2
y-300-d-500-svd-100       -0.17   -0.05   -0.13   -0.08    1480   1485   1483    1     1     1
y-300-d-500-svd-125       -3.51   -3.41   -3.46   -3.44    1480   1485   1483    1     1     1
y-300-d-500-svd-150       -6.88   -6.78   -6.84   -6.81    1480   1485   1483    1     2     2
y-300-d-1000-svd-100      0.91    0.93    0.92    0.93     3028   3039   3034    2     2     2
y-300-d-1000-svd-125      -0.66   -0.63   -0.65   -0.63    3028   3039   3034    2     2     2
y-300-d-1000-svd-150      -2.25   -2.21   -2.22   -2.21    3028   3039   3034    3     3     3
y-500-d-500-svd-100       -       -       -       -        -      -      -       2     2     2
y-500-d-500-svd-125       -       -       -       -        -      -      -       2     2     2
y-500-d-500-svd-150       -       -       -       -        -      -      -       3     3     3
y-500-d-1000-svd-100      -       -       -       -        -      -      -       3     3     3
y-500-d-1000-svd-125      -       -       -       -        -      -      -       4     4     4
y-500-d-1000-svd-150      -       -       -       -        -      -      -       4     4     4
y-1000-d-500-svd-100      -       -       -       -        -      -      -       4     4     4
y-1000-d-500-svd-125      -       -       -       -        -      -      -       5     5     5
y-1000-d-500-svd-150      -       -       -       -        -      -      -       6     6     6
y-1000-d-1000-svd-100     -       -       -       -        -      -      -       7     7     7
y-1000-d-1000-svd-125     -       -       -       -        -      -      -       8     8     8
y-1000-d-1000-svd-150     -       -       -       -        -      -      -       10    10    10
y-2000-d-500-svd-100      -       -       -       -        -      -      -       9     9     9
y-2000-d-500-svd-125      -       -       -       -        -      -      -       11    11    11
y-2000-d-500-svd-150      -       -       -       -        -      -      -       13    13    13
y-2000-d-1000-svd-100     -       -       -       -        -      -      -       16    16    16
y-2000-d-1000-svd-125     -       -       -       -        -      -      -       20    20    20
y-2000-d-1000-svd-150     -       -       -       -        -      -      -       23    23    23
y-5000-d-500-svd-100      -       -       -       -        -      -      -       24    24    24
y-5000-d-500-svd-125      -       -       -       -        -      -      -       29    29    29
y-5000-d-500-svd-150      -       -       -       -        -      -      -       36    36    36
y-5000-d-1000-svd-100     -       -       -       -        -      -      -       -     -     -
y-5000-d-1000-svd-125     -       -       -       -        -      -      -       -     -     -
y-5000-d-1000-svd-150     -       -       -       -        -      -      -       -     -     -
y-10000-d-500-svd-100     -       -       -       -        -      -      -       -     -     -
y-10000-d-500-svd-125     -       -       -       -        -      -      -       -     -     -
y-10000-d-500-svd-150     -       -       -       -        -      -      -       -     -     -
y-10000-d-1000-svd-100    -       -       -       -        -      -      -       -     -     -

Table 1: Comparison of the full SVD by lapack++ with redSVD on various randomized instances with k = 100 and varying
parameters. The table shows the error percentage of the approximate solution and the running times in seconds. Notice that
the number of points in the instances is k · y. Experiment belongs to class I. Given a matrix A, its full SVD and its
approximate SVD, we verify the accuracy of the redSVD approximation by comparing ||A - A'_ℓ||_F to ||A - A_ℓ||_F on
instances of the StructuredWithNoise class. The matrix obtained by projecting A to its best fit subspace of dimension ℓ
minimizes the Frobenius norm of the difference to A, so this is a suitable measure to evaluate the redSVD result.



3.2 Performance of BICO, Piecy and Piecy-MR 
BICO. 

Table 2 contains the basic test cases and reports the results that BICO achieved when run on the test
cases directly. Notice that we use the current version of the source code from the BICO website. In
contrast to the version used in [9], this version has varying running times. This shows both in the
BICO experiments themselves and in the experiments for piecy and piecy-mr, since they both use BICO. For
example, consider the varying running time of BICO on the enron data set. Obviously, piecy and
piecy-mr will improve when the source code of BICO is updated. For this reason, we pay most
attention to the median of the running times and not the average running time.
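
To see why the median is the more robust summary here, consider a small example. The running times below are hypothetical and made up for illustration; they are not measurements from Table 2.

```python
from statistics import mean, median

# Hypothetical running times in seconds for repeated runs of one test case.
runs = [500, 510, 520, 530, 1700]

average_time = mean(runs)    # pulled up by the single slow run
median_time = median(runs)   # stays close to a typical run
```

A single slow run shifts the average far away from the typical running time, while the median is unaffected.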

In all tables, the parameters are listed in the caption if they are equal for all test cases in the table, 
or at the start of each line if they vary. We denote the number of points by n, the dimension by d and 
the number of centers by k. 


Group 


Cost 




Running time 



min 

max 

average 

median 

min 

max 

avg 

med 

LowerBound, experiments belong to class II 

k-lO-n-lO-^-d-lOOlO 

5.00 X lO’’ 

5.00 X 10^ 

5.00 X 10^ 

5.00 X 10’’ 

74 

77 

75.6 

76 

k-SO-n-lO-^-d-lOOSO 

4.98 X lO’’ 

14.88 X 10^ 

8.94 X 10^ 

4.98 X lO’ 

78 

79 

78.7 

79 

BagOfWords, experiments belong to class II 

enron-k-10 

1.63 X lO’’ 

1.69 X lO’’ 

1.65 X lO’’ 

1.66 X 10’ 

480 

1679 

611.9 

491 

kos-k-2 

3.90 X 10® 

3.95 X 10® 

3.92 X 10® 

3.91 X 10® 

10 

11 

10.9 

11 

Caltechl28, experiments belong to class I 

k-5 

4.23 X 10“ 

4.23 X 10“ 

4.23 X 10“ 

4.23 X 10“ 

319 

319 

319.1 

319 

k-10 

4.13 X 10“ 

4.13 X 10“ 

4.13 X 10“ 

4.13 X 10“ 

366 

366 

366.0 

366 

k-50 

3.43 X 10“ 

3.43 X 10“ 

3.43 X 10“ 

3.43 X 10“ 

428 

428 

427.6 

428 

k-100 

3.04 X 10“ 

3.04 X 10“ 

3.04 X 10“ 

3.04 X 10“ 

503 

503 

502.9 

503 

k-250 

2.74 X 10“ 

2.74 X 10“ 

2.74 X 10“ 

2.74 X 10“ 

571 

571 

571.1 

571 

k-1000 

2.34 X 10“ 

2.34 X 10“ 

2.34 X 10“ 

2.34 X 10“ 

560 

560 

559.7 

560 

Random, experiments belong to class II 

n-10®-d-1000-k-10 

3.33 X 10i° 

3.33 X lO^® 

3.33 X lOi® 

3.33 X lO’O 

1058 

2126 

1718.6 

1816 

n-10®-d-1000-k-20 

3.33 X lOio 

3.33 X lOio 

3.33 X lOi® 

3.33 X lO’O 

2578 

4792 

3522.8 

2952 

n-10®-d-1000-k-50 

3.33 X 10i° 

3.33 X lO^® 

3.33 X lOi® 

3.33 X lO’O 

1004 

4466 

2326.8 

1819 

StructuredWithNoise, experiments belong to class I 

y-5000-d-1000-k-10 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

368 

1227 

610.1 

592 

y-5000-d-1000-k-20 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

591 

2204 

1217.8 

1085 

y-5000-d-1000-k-50 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

547 

2865 

1133.8 

872 

y-5000-d-1000-k-100 

1.69 X 10® 

1.70 X 10® 

1.70 X 10® 

1.70 X 10® 

714 

7679 

2275.1 

1359 

y-10000-d-500-k-10 

3.36 X 10® 

3.37 X 10® 

3.36 X 10® 

3.36 X 10® 

411 

1284 

740.9 

691 

y-10000-d-500-k-20 

3.35 X 10® 

3.37 X 10® 

3.36 X 10® 

3.36 X 10® 

435 

2392 

1030.0 

805 

y-10000-d-500-k-50 

3.34 X 10® 

3.37 X 10® 

3.36 X 10® 

3.36 X 10® 

576 

4772 

2295.2 

2084 

y-10000-d-500-k-100 

3.32 X 10® 

3.37 X 10® 

3.34 X 10® 

3.34 X 10® 

846 

6233 

2168.6 

1434 

y-lOOOO-d-lOOO-k-10 

3.41 X 10® 

3.41 X 10® 

3.41 X 10® 

3.41 X 10® 

722 

2669 

1454.9 

1244 

y-lOOOO-d-lOOO-k-20 

3.40 X 10® 

3.41 X 10® 

3.41 X 10® 

3.41 X 10® 

770 

4521 

2350.0 

2230 

y-lOOOO-d-lOOO-k-50 

3.40 X 10® 

3.41 X 10® 

3.40 X 10® 

3.40 X 10® 

1299 

8648 

4547.8 

4897 

y-lOOOO-d-lOOO-k-100 

3.39 X 10® 

3.41 X 10® 

3.40 X 10® 

3.40 X 10® 

1477 

8626 

3602.6 

2605 

StructuredWithNoise, experiments belong to class III 

y-1000000-d-500-k-50 

3.34 X 10® 

3.35 X 10® 

3.34 X 10® 

3.34 X 10® 

335 

4192 

1280.2 

938 


Table 2: BICO results. 


Piecy. 

For piecy, we test the influence of two parameters: the piece size, abbreviated ps, and the number of
dimensions to which we project the points, abbreviated svd. We computed an extensive number of
test cases for the data set Caltech128 to study the influence of the parameters. Table 3 summarizes
the results for piecy. For k = 5, 10, 50, piecy is always faster than BICO. The table shows that larger
values of svd increase the running time, which is expected, but it stays below the running time of BICO
for these test cases. The accuracy of piecy is high, in particular for larger svd values. At k = 100,
the situation starts to change as there are three test cases where piecy is slower than BICO. For
k = 250, 1000 the results by piecy become somewhat unpredictable. Notice that the number of centers
is here higher than the input dimension of the points (which is 128). Thus, piecy cannot gain anything
from projecting to a number of dimensions > k, and the SVD processing becomes overhead. It is thus
clear that piecy does not perform as well on these test cases.


Group             Cost (x 10^11)                    Running time
                  min    max    average  median     min    max    avg      med
k = 5
ps-1000-s-10      4.37   4.65   4.52     4.54       202    244    217.7    209
ps-1000-s-20      4.35   4.46   4.38     4.36       201    259    228.8    235
ps-1000-s-50      4.25   4.51   4.36     4.36       224    256    236.5    234
ps-1000-s-75      4.22   4.36   4.28     4.27       255    285    274.4    279
ps-1000-s-100     4.20   4.66   4.38     4.40       294    333    313.8    317
ps-2000-s-10      4.39   4.65   4.51     4.51       203    246    220.4    222
ps-2000-s-20      4.39   4.64   4.50     4.45       211    297    248.8    241
ps-2000-s-50      4.22   4.44   4.32     4.31       234    289    253.7    249
ps-2000-s-75      4.20   4.44   4.35     4.36       262    294    282.0    287
ps-2000-s-100     4.20   4.44   4.32     4.32       280    310    297.4    296
ps-5000-s-10      4.47   4.71   4.57     4.59       213    257    237.6    239
ps-5000-s-20      4.30   4.44   4.36     4.33       240    290    256.5    245
ps-5000-s-50      4.21   4.33   4.28     4.28       245    296    264.7    260
ps-5000-s-75      4.28   4.33   4.31     4.32       263    319    291.8    290
ps-5000-s-100     4.25   4.41   4.32     4.30       277    310    292.8    291
ps-10000-s-10     4.46   4.60   4.54     4.53       215    291    241.4    242
ps-10000-s-20     4.36   4.44   4.39     4.37       214    271    243.6    245
ps-10000-s-50     4.27   4.30   4.28     4.28       238    251    244.4    244
ps-10000-s-75     4.27   4.60   4.42     4.38       262    283    272.4    267
ps-10000-s-100    4.28   4.49   4.40     4.41       268    291    280.0    281
k = 10
ps-1000-s-10      4.13   4.22   4.18     4.20       225    280    248.5    252
ps-1000-s-20      3.99   4.13   4.06     4.07       266    390    312.6    303
ps-1000-s-50      3.94   4.10   4.00     3.99       242    319    271.0    263
ps-1000-s-75      3.89   4.07   3.97     3.97       280    367    321.3    336
ps-1000-s-100     3.95   4.06   3.98     3.95       309    341    327.2    331
ps-2000-s-10      4.21   4.62   4.33     4.29       227    339    269.6    264
ps-2000-s-20      4.01   4.18   4.07     4.06       238    268    253.1    250
ps-2000-s-50      3.93   3.99   3.97     3.97       252    282    268.1    268
ps-2000-s-75      3.89   4.07   3.94     3.90       267    363    312.6    305
ps-2000-s-100     3.94   4.05   3.99     4.00       318    384    350.5    359
ps-5000-s-10      4.23   4.30   4.28     4.29       241    364    303.4    285
ps-5000-s-20      4.03   4.23   4.12     4.14       233    272    256.3    264
ps-5000-s-50      3.94   4.12   4.03     4.04       245    347    283.6    287
ps-5000-s-75      3.89   4.01   3.97     3.97       266    318    282.2    271
ps-5000-s-100     3.87   4.14   3.99     3.98       298    324    308.9    308
ps-10000-s-10     4.24   4.29   4.27     4.27       260    295    277.2    281
ps-10000-s-20     4.05   4.27   4.15     4.12       239    263    253.6    254
ps-10000-s-50     3.88   4.07   3.99     4.00       253    311    282.1    277
ps-10000-s-75     4.01   4.07   4.04     4.04       277    310    293.8    289
ps-10000-s-100    3.92   4.03   3.97     3.96       308    367    326.9    313
k = 50
ps-1000-s-10      3.66   3.74   3.69     3.70       333    442    381.5    367
ps-1000-s-20      3.43   3.56   3.51     3.51       269    376    334.7    328
ps-1000-s-50      3.30   3.34   3.32     3.33       306    415    355.6    341
ps-1000-s-75      3.25   3.37   3.30     3.31       351    470    395.3    385
ps-1000-s-100     3.24   3.33   3.29     3.30       377    412    397.8    410
ps-2000-s-10      3.73   3.80   3.76     3.77       312    480    364.6    350
ps-2000-s-20      3.46   3.55   3.52     3.53       329    485    419.4    415
ps-2000-s-50      3.30   3.44   3.37     3.36       343    477    399.2    372
ps-2000-s-75      3.27   3.31   3.28     3.28       310    505    387.2    343
ps-2000-s-100     3.28   3.34   3.31     3.31       344    423    370.9    357
ps-5000-s-10      3.78   3.83   3.80     3.80       312    414    361.0    361
ps-5000-s-20      3.53   3.59   3.56     3.56       286    444    344.2    308
ps-5000-s-50      3.29   3.39   3.34     3.36       310    427    357.0    341
ps-5000-s-75      3.25   3.41   3.34     3.35       326    401    355.2    348
ps-5000-s-100     3.27   3.35   3.30     3.28       324    505    411.8    391
ps-10000-s-10     3.81   3.90   3.85     3.85       283    471    337.1    313
ps-10000-s-20     3.56   3.65   3.60     3.60       321    396    349.8    348
ps-10000-s-50     3.30   3.42   3.35     3.36       313    384    356.6    362
ps-10000-s-75     3.28   3.39   3.32     3.29       327    452    377.7    363
ps-10000-s-100    3.25   3.38   3.33     3.35       348    479    399.5    381
k = 100
ps-1000-s-10      3.52   3.57   3.54     3.53       399    629    552.8    572
ps-1000-s-20      3.24   3.30   3.27     3.28       373    625    454.3    398
ps-1000-s-50      3.02   3.13   3.07     3.06       420    608    503.1    471
ps-1000-s-75      3.04   3.08   3.05     3.05       366    481    429.8    424
ps-1000-s-100     3.00   3.09   3.04     3.06       464    795    557.4    484
ps-2000-s-10      3.54   3.59   3.57     3.57       381    746    497.3    444
ps-2000-s-20      3.28   3.35   3.32     3.32       438    527    487.9    490
ps-2000-s-50      3.06   3.16   3.10     3.09       341    475    412.2    399
ps-2000-s-75      3.05   3.11   3.08     3.08       382    515    422.2    393
ps-2000-s-100     2.98   3.08   3.04     3.04       492    657    587.2    595
ps-5000-s-10      3.63   3.68   3.66     3.67       368    494    416.7    404
ps-5000-s-20      3.31   3.39   3.34     3.33       448    723    550.0    528
ps-5000-s-50      3.03   3.16   3.08     3.09       351    602    471.5    493
ps-5000-s-75      3.05   3.14   3.08     3.07       357    812    487.9    421
ps-5000-s-100     3.00   3.10   3.05     3.02       458    610    516.2    511
ps-10000-s-10     3.65   3.71   3.67     3.66       306    441    382.7    378
ps-10000-s-20     3.35   3.40   3.37     3.37       341    534    449.3    468
ps-10000-s-50     3.08   3.17   3.11     3.09       426    652    508.4    467
ps-10000-s-75     3.03   3.16   3.08     3.07       358    610    474.3    441
ps-10000-s-100    3.00   3.12   3.05     3.04       390    492    465.2    482
k = 250
ps-1000-s-10      3.27   3.29   3.28     3.29       466    976    817.2    939
ps-1000-s-20      2.97   3.04   2.99     2.98       441    883    681.4    667
ps-1000-s-50      2.77   2.82   2.79     2.79       437    736    599.8    650
ps-1000-s-75      2.73   2.79   2.75     2.75       440    767    633.9    608
ps-1000-s-100     2.68   2.80   2.74     2.73       468    816    686.6    692
ps-2000-s-10      3.33   3.38   3.36     3.36       414    1226   759.5    843
ps-2000-s-20      3.03   3.09   3.07     3.09       435    815    601.0    581
ps-2000-s-50      2.78   2.87   2.82     2.82       472    672    539.1    531
ps-2000-s-75      2.72   2.83   2.78     2.79       386    729    540.2    447
ps-2000-s-100     2.72   2.82   2.78     2.78       489    642    558.4    513
ps-5000-s-10      3.41   3.47   3.45     3.45       330    713    509.0    471
ps-5000-s-20      3.08   3.17   3.12     3.12       461    868    686.7    791
ps-5000-s-50      2.83   2.86   2.84     2.84       435    726    594.9    663
ps-5000-s-75      2.72   2.81   2.75     2.74       437    762    658.0    732
ps-5000-s-100     2.73   2.77   2.75     2.76       445    775    646.8    673
ps-10000-s-10     3.45   3.50   3.48     3.47       314    619    485.4    469
ps-10000-s-20     3.12   3.19   3.14     3.13       462    829    690.7    703
ps-10000-s-50     2.81   2.83   2.82     2.83       450    823    589.3    589
ps-10000-s-75     2.74   2.86   2.79     2.81       417    742    556.9    485
ps-10000-s-100    2.71   2.82   2.76     2.77       476    823    671.3    727
k = 1000
ps-1000-s-10      2.94   2.97   2.96     2.96       379    9746   3319.7   478
ps-1000-s-20      2.62   2.68   2.65     2.66       428    6384   3026.1   3668
ps-1000-s-50      2.35   2.43   2.39     2.40       883    4566   2324.2   1801
ps-1000-s-75      2.32   2.38   2.35     2.35       586    4788   1601.1   859
ps-1000-s-100     2.32   2.33   2.33     2.33       1076   3793   2248.7   3417
ps-2000-s-10      3.03   3.09   3.06     3.05       317    6692   3446.5   3786
ps-2000-s-20      2.71   2.77   2.74     2.73       459    5175   2885.6   3583
ps-2000-s-50      2.39   2.45   2.42     2.42       573    5388   2791.4   2773
ps-2000-s-75      2.32   2.37   2.35     2.36       653    2866   1701.0   1947
ps-2000-s-100     2.29   2.38   2.33     2.33       585    5912   1748.2   706
ps-5000-s-10      3.17   3.18   3.17     3.18       4664   7886   6440.7   6910
ps-5000-s-20      2.78   2.85   2.81     2.81       536    5521   3058.6   3454
ps-5000-s-50      2.43   2.46   2.44     2.44       526    2366   954.0    590
ps-5000-s-75      2.35   2.38   2.37     2.37       809    3889   1839.0   961
ps-5000-s-100     2.32   2.36   2.33     2.33       626    6184   3354.7   4155
ps-10000-s-10     3.28   3.30   3.29     3.29       262    4210   2197.8   1526
ps-10000-s-20     2.82   2.87   2.84     2.84       960    4509   2909.1   3196
ps-10000-s-50     2.44   2.50   2.47     2.48       496    4026   1776.6   791
ps-10000-s-75     2.35   2.38   2.37     2.37       598    5633   2186.3   1202
ps-10000-s-100    2.31   2.34   2.33     2.33       592    4653   1874.3   864

Table 3: Piecy on Caltech128, experiments belong to class I.


On the Random instance, piecy performs rather badly. The instance is large (one million points with
1000 dimensions, i.e., a total of 10^9 input numbers). In this case, most of the advantage due to the
dimensionality reduction is lost because too many pieces are processed and contribute to the intrinsic
dimension of the point set that is given to BICO. A similar behavior can be observed for the three
largest StructuredWithNoise data sets. In particular when n reaches a million points, piecy's running
time goes up.
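
The effect of many pieces on the intrinsic dimension can be illustrated with a small experiment. The sketch below assumes that each piece of ps points is replaced by its projection onto its own best-fit subspace of dimension svd. This is an assumption made for illustration; the function name project_pieces is hypothetical and BICO itself is not involved.

```python
import numpy as np

def project_pieces(points, piece_size, svd_dim):
    """Project every piece of `piece_size` rows onto its own best-fit subspace."""
    projected = []
    for start in range(0, points.shape[0], piece_size):
        piece = points[start:start + piece_size]
        _, _, Vt = np.linalg.svd(piece, full_matrices=False)
        basis = Vt[:svd_dim]                        # top right singular vectors of the piece
        projected.append(piece @ basis.T @ basis)   # same ambient dimension, rank <= svd_dim
    return np.vstack(projected)

rng = np.random.default_rng(0)
data = rng.uniform(-10.0, 10.0, size=(400, 60))
one_piece = project_pieces(data, piece_size=400, svd_dim=5)
many_pieces = project_pieces(data, piece_size=50, svd_dim=5)

rank_one = np.linalg.matrix_rank(one_piece)      # bounded by svd_dim
rank_many = np.linalg.matrix_rank(many_pieces)   # bounded by (number of pieces) * svd_dim
```

Each piece contributes its own subspace, so the rank of the combined point set can grow with the number of pieces even though every single piece has low rank.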

On the smaller LowerBound test cases though, piecy again outperforms BICO's running time. The
LowerBound instances have a huge dimension of 10^4, but the number of points is also bounded by 10^4.
Thus, there is less time for piecy to accumulate too many intrinsic dimensions.


Group              Cost (x 10^10)                    Running time
                   min    max    average  median     min    max    avg      med
k = 10
ps-2000-svd-10     3.33   3.33   3.33     3.33       1013   1464   1283.2   1289
ps-2000-svd-20     3.33   3.33   3.33     3.33       997    1841   1536.3   1666
ps-2000-svd-50     3.33   3.33   3.33     3.33       945    2235   1452.7   1347
ps-2000-svd-75     3.33   3.33   3.33     3.33       755    2399   1771.7   1981
ps-4000-svd-10     3.33   3.33   3.33     3.33       1178   2048   1491.4   1340
ps-4000-svd-20     3.33   3.33   3.33     3.33       1083   2156   1536.4   1408
ps-4000-svd-50     3.33   3.33   3.33     3.33       759    1890   1423.3   1561
ps-4000-svd-75     3.33   3.33   3.33     3.33       953    2148   1628.7   1495
ps-10000-svd-10    3.33   3.33   3.33     3.33       1447   2065   1650.2   1511
ps-10000-svd-20    3.33   3.33   3.33     3.33       1091   1887   1500.9   1537
ps-10000-svd-50    3.33   3.33   3.33     3.33       1058   2752   1977.8   2115
ps-10000-svd-75    3.33   3.33   3.33     3.33       1071   2836   1711.0   1523
k = 20
ps-2000-svd-20     3.32   3.32   3.32     3.32       2484   4608   3539.3   3853
ps-2000-svd-50     3.33   3.33   3.33     3.33       950    4724   3133.2   3233
ps-2000-svd-75     3.33   3.33   3.33     3.33       1359   2931   2303.3   2545
ps-4000-svd-20     3.32   3.33   3.32     3.32       2160   4826   3257.8   2846
ps-4000-svd-50     3.33   3.33   3.33     3.33       2469   5351   3685.4   3384
ps-4000-svd-75     3.32   3.33   3.33     3.33       1119   2563   1896.9   1930
ps-10000-svd-20    3.32   3.32   3.32     3.32       1623   4650   2893.0   2634
ps-10000-svd-50    3.33   3.33   3.33     3.33       1493   4866   3010.0   2701
ps-10000-svd-75    3.32   3.33   3.33     3.33       2523   3985   3268.6   3454
k = 50
ps-2000-svd-50     3.32   3.32   3.32     3.32       1798   4881   3537.6   3742
ps-2000-svd-75     3.32   3.33   3.33     3.33       1047   9855   4069.4   3864
ps-4000-svd-50     3.32   3.32   3.32     3.32       1350   7193   4015.3   4781
ps-4000-svd-75     3.32   3.33   3.32     3.32       1382   6330   4421.8   4843
ps-10000-svd-50    3.32   3.32   3.32     3.32       1138   7283   3716.0   3536
ps-10000-svd-75    3.32   3.32   3.32     3.32       1585   7521   4017.2   2233

Table 4: Piecy on Random instances with n = 10^6 and d = 1000 and varying parameters, experiments belong to class II.
Figure 4: Results for the Caltech128 data set. Left side reports quality, right side run times. Variances
stem from different parameters.

Figure 5: Results for a StructuredWithNoise data set with 10^6 points in 10^3 dimensions. Left side
reports quality, right side run times. Variances stem from different parameters.


Group 


Cost 



Running time 


min 

max 

average 

median 

min 

max 

avg 

med 

k-lO-d-lOOlO-k-lO-svd-15 

5.72 X 10^ 

6.11 X 10’’ 

5.86 X 10’’ 

5.84 X 10’’ 

53 

76 

62.9 

58 

k-lO-d-lOOlO-k-lO-svd-20 

5.49 X 10^ 

5.68 X 10’’ 

5.58 X 10’’ 

5.58 X 10’’ 

56 

94 

69.1 

61 

k-50-d-10050-k-50-svd-75 

5.69 X 10’’ 

5.77 X 10’’ 

5.73 X 10’’ 

5.73 X 10’’ 

78 

78 

78.2 

78 

k-50-d-10050-k-50-svd-100 

5.49 X 10’’ 

5.52 X 10’’ 

5.50 X 10’’ 

5.50 X 10’’ 

79 

80 

78.9 

79 


Table 5: Piecy on LowerBound instances with n = 10000 and a piece size of 2000, experiments belong to class II. 


Piecy-mr. 

Piecy-mr also uses ps, the piece size, as a parameter, as well as svd, the number of dimensions to 
project to. The additional parameter np is the number of pieces that are processed into the same 
BICO instance. 

For Caltech128, the overhead of piecy-mr does not pay off and it performs worse than piecy. Results
for this data set are shown in Figure 4. On the LowerBound test cases, piecy-mr is always slightly faster
than BICO and comparable to piecy. On the Random instances, piecy-mr is much faster than BICO,
close to a factor of 2 on most test cases. This is in particular a much better running time than for
piecy. The fact that Random has both a huge number of points and a high dimension means that the
strength of piecy-mr shows and is not dominated by the overhead of the computation tree. The study
of the three StructuredWithNoise data sets confirms this behaviour. In all three cases, the running
time of piecy-mr is much lower than or at least comparable to that of BICO, with very few exceptions. This effect
is particularly clear for the largest data set with one million points and a dimension of 1000, showing
the speed of piecy-mr for large high-dimensional data sets. Figure 5 shows results for this data set.
Notice that the large variance for piecy and piecy-mr is due to very different parameter choices. The
best parameter choices yield a significant speed-up, particularly for large values of k.


Group                   Cost (x 10^11)                  Running time
                        min    max    average  median   min    max    avg      med

k = 5
ps-5000-np-5-s-50       4.28   4.29   4.28     4.28     469    504    483.6    479
ps-5000-np-5-s-75       4.25   4.29   4.27     4.27     491    514    501.5    500
ps-5000-np-5-s-100      4.23   4.29   4.27     4.27     502    548    525.5    525
ps-5000-np-10-s-50      4.19   4.19   4.19     4.19     365    420    394.6    395
ps-5000-np-10-s-75      4.17   4.19   4.18     4.18     418    480    439.0    430
ps-5000-np-10-s-100     4.17   4.19   4.19     4.19     449    490    468.0    468
ps-5000-np-15-s-50      4.17   4.19   4.18     4.19     355    394    371.1    367
ps-5000-np-15-s-75      4.16   4.18   4.17     4.17     397    461    428.3    437
ps-5000-np-15-s-100     4.16   4.19   4.17     4.17     444    478    468.4    474
ps-10000-np-5-s-50      4.16   4.18   4.17     4.16     528    591    566.8    581
ps-10000-np-5-s-75      4.14   4.17   4.15     4.15     543    621    590.0    602
ps-10000-np-5-s-100     4.14   4.17   4.16     4.16     535    647    596.3    602
ps-10000-np-10-s-50     4.18   4.24   4.21     4.19     420    551    473.4    475
ps-10000-np-10-s-75     4.18   4.19   4.18     4.18     487    556    517.1    504
ps-10000-np-10-s-100    4.17   4.24   4.19     4.18     495    579    541.3    555
ps-10000-np-15-s-50     4.15   4.20   4.17     4.17     413    451    437.4    440
ps-10000-np-15-s-75     4.15   4.18   4.17     4.17     443    485    467.9    471
ps-10000-np-15-s-100    4.16   4.18   4.17     4.16     470    630    524.3    508

k = 10
ps-5000-np-5-s-50       4.01   4.07   4.03     4.02     454    531    487.4    480
ps-5000-np-5-s-75       4.01   4.08   4.05     4.04     480    513    495.4    496
ps-5000-np-5-s-100      4.00   4.08   4.03     4.03     503    557    520.8    516
ps-5000-np-10-s-50      3.94   3.99   3.97     3.96     371    428    392.2    393
ps-5000-np-10-s-75      3.92   3.98   3.94     3.94     405    446    430.2    434
ps-5000-np-10-s-100     3.92   3.98   3.95     3.95     448    489    465.1    470
ps-5000-np-15-s-50      3.90   3.92   3.91     3.91     360    404    380.6    376
ps-5000-np-15-s-75      3.88   3.91   3.89     3.88     382    419    407.2    415
ps-5000-np-15-s-100     3.87   3.94   3.90     3.90     407    481    444.5    447
ps-10000-np-5-s-50      3.87   3.90   3.88     3.89     569    725    617.7    588
ps-10000-np-5-s-75      3.84   3.89   3.86     3.86     557    645    603.2    614
ps-10000-np-5-s-100     3.84   3.89   3.86     3.85     600    701    659.9    674
ps-10000-np-10-s-50     3.89   3.92   3.91     3.90     474    512    492.2    494
ps-10000-np-10-s-75     3.87   3.94   3.90     3.90     475    541    494.4    487
ps-10000-np-10-s-100    3.87   3.90   3.88     3.88     505    552    530.1    536
ps-10000-np-15-s-50     3.89   3.91   3.90     3.89     412    484    455.9    465
ps-10000-np-15-s-75     3.87   3.93   3.90     3.90     429    494    449.1    432
ps-10000-np-15-s-100    3.85   3.89   3.87     3.86     489    609    540.7    536

k = 50
ps-5000-np-5-s-50       3.46   3.53   3.50     3.50     474    494    482.2    479
ps-5000-np-5-s-75       3.41   3.50   3.47     3.47     473    569    513.0    505
ps-5000-np-5-s-100      3.44   3.54   3.48     3.47     489    576    519.8    518
ps-5000-np-10-s-50      3.44   3.52   3.47     3.47     360    429    393.5    383
ps-5000-np-10-s-75      3.43   3.49   3.46     3.48     400    442    417.2    428
ps-5000-np-10-s-100     3.39   3.43   3.41     3.42     444    499    471.0    494
ps-5000-np-15-s-50      3.41   3.46   3.43     3.44     356    431    392.0    419
ps-5000-np-15-s-75      3.34   3.41   3.38     3.40     397    435    417.3    432
ps-5000-np-15-s-100     3.32   3.40   3.36     3.39     423    502    460.6    484
ps-10000-np-5-s-50      3.35   3.42   3.39     3.41     562    652    597.8    627
ps-10000-np-5-s-75      3.33   3.39   3.36     3.38     609    642    624.2    639
ps-10000-np-5-s-100     3.28   3.32   3.30     3.31     579    661    615.3    645
ps-10000-np-10-s-50     3.32   3.38   3.35     3.37     447    574    509.1    560
ps-10000-np-10-s-75     3.30   3.32   3.31     3.32     435    488    459.2    474
ps-10000-np-10-s-100    3.28   3.30   3.29     3.30     516    607    546.1    576
ps-10000-np-15-s-50     3.36   3.48   3.43     3.47     431    452    437.7    444
ps-10000-np-15-s-75     3.31   3.42   3.36     3.40     431    508    479.7    506


Table 6: Piecy-mr on Caltech128, experiments belong to class I.

Group                   Cost (x 10^11)                  Running time
                        min    max    average  median   min    max    avg      med

ps-10000-np-15-s-100    3.29   3.40   3.34     3.39     489    553    520.9    546

k = 100
ps-5000-np-5-s-50       3.28   3.37   3.32     3.35     435    531    487.3    513
ps-5000-np-5-s-75       3.20   3.30   3.26     3.29     470    545    505.6    537
ps-5000-np-5-s-100      3.22   3.31   3.26     3.29     524    586    553.9    575
ps-5000-np-10-s-50      3.27   3.32   3.30     3.32     372    440    397.4    421
ps-5000-np-10-s-75      3.23   3.26   3.24     3.25     414    451    429.6    442
ps-5000-np-10-s-100     3.17   3.22   3.21     3.22     427    507    471.3    498
ps-5000-np-15-s-50      3.21   3.29   3.24     3.27     359    423    395.0    415
ps-5000-np-15-s-75      3.20   3.26   3.22     3.24     409    454    424.7    439
ps-5000-np-15-s-100     3.16   3.22   3.19     3.21     420    471    447.8    469
ps-10000-np-5-s-50      3.18   3.25   3.21     3.24     551    730    619.4    681
ps-10000-np-5-s-75      3.15   3.19   3.17     3.18     594    705    646.7    686
ps-10000-np-5-s-100     3.08   3.14   3.11     3.13     552    679    625.6    663
ps-10000-np-10-s-50     3.15   3.18   3.16     3.17     427    482    457.6    475
ps-10000-np-10-s-75     3.12   3.14   3.13     3.14     458    538    490.4    515
ps-10000-np-10-s-100    3.07   3.09   3.08     3.09     537    636    582.4    608
ps-10000-np-15-s-50     3.20   3.23   3.21     3.22     387    492    433.0    468
ps-10000-np-15-s-75     3.15   3.21   3.17     3.19     455    503    475.5    491
ps-10000-np-15-s-100    3.09   3.23   3.17     3.22     488    594    533.4    572


Table 6: Piecy-mr on Caltech128 (continued), experiments belong to class I.


Group                   Cost (x 10^9)                   Running time
                        min    max    average  median   min    max    avg      med

n-5000-d-1000
k-10-ps-2000-svd-10     1.70   1.70   1.70     1.70     391    1018   703.4    644
k-10-ps-2000-svd-20     1.70   1.70   1.70     1.70     442    1155   767.6    736
k-10-ps-2000-svd-50     1.70   1.70   1.70     1.70     348    1018   554.4    489
k-10-ps-2000-svd-70     1.70   1.70   1.70     1.70     334    823    534.6    508
k-20-ps-4000-svd-10     1.70   1.70   1.70     1.70     439    948    698.2    676
k-20-ps-4000-svd-20     1.70   1.70   1.70     1.70     574    1670   937.9    834
k-20-ps-4000-svd-50     1.70   1.70   1.70     1.70     417    1049   720.7    675
k-20-ps-4000-svd-70     1.70   1.70   1.70     1.70     443    934    606.0    553
k-50-ps-10000-svd-10    1.69   1.69   1.69     1.69     402    647    478.2    461
k-50-ps-10000-svd-20    1.69   1.70   1.70     1.70     646    2021   996.5    978
k-50-ps-10000-svd-50    1.69   1.70   1.70     1.70     612    4183   1823.8   1741
k-50-ps-10000-svd-70    1.69   1.70   1.70     1.70     510    1968   1246.0   1206
k-100-ps-20000-svd-10   1.69   1.69   1.69     1.69     341    662    459.6    437
k-100-ps-20000-svd-20   1.69   1.69   1.69     1.69     473    765    593.8    595
k-100-ps-20000-svd-50   1.69   1.69   1.69     1.69     719    3507   1597.3   1528
k-100-ps-20000-svd-70   1.69   1.69   1.69     1.69     704    5198   1654.6   1340

n-10000-d-500
k-10-ps-2000-svd-10     3.35   3.36   3.36     3.36     453    1130   699.0    565
k-10-ps-2000-svd-20     3.35   3.37   3.36     3.36     475    1237   734.9    694
k-10-ps-2000-svd-50     3.36   3.37   3.36     3.36     342    818    528.3    554
k-10-ps-2000-svd-70     3.36   3.37   3.36     3.36     347    1069   526.9    480
k-20-ps-4000-svd-10     3.34   3.35   3.35     3.35     615    1298   993.0    1023
k-20-ps-4000-svd-20     3.34   3.36   3.35     3.35     549    2391   1425.2   1315
k-20-ps-4000-svd-50     3.36   3.37   3.36     3.36     424    2032   1058.1   968
k-20-ps-4000-svd-70     3.35   3.37   3.36     3.36     400    1972   1027.7   1040
k-50-ps-10000-svd-10    3.33   3.33   3.33     3.33     484    1496   785.7    724
k-50-ps-10000-svd-20    3.33   3.35   3.34     3.34     747    2996   1567.3   1469
k-50-ps-10000-svd-50    3.34   3.37   3.35     3.35     895    4015   2259.5   2359
k-50-ps-10000-svd-70    3.34   3.37   3.35     3.35     560    4079   2100.6   2244
k-100-ps-20000-svd-10   3.32   3.32   3.32     3.32     390    579    483.4    494
k-100-ps-20000-svd-20   3.32   3.32   3.32     3.32     515    1876   1035.3   971
k-100-ps-20000-svd-50   3.32   3.33   3.33     3.33     803    8613   2701.5   2205
k-100-ps-20000-svd-70   3.32   3.37   3.33     3.33     1161   6735   3093.9   2126


Table 7: Piecy on StructuredWithNoise, experiments belong to class I.

Group                   Cost (x 10^9)                   Running time
                        min    max    average  median   min    max    avg      med

n-10000-d-1000
k-10-ps-2000-svd-10     3.40   3.41   3.40     3.40     912    2544   1728.3   1777
k-10-ps-2000-svd-20     3.40   3.41   3.41     3.41     862    2752   1703.0   1575
k-10-ps-2000-svd-50     3.40   3.41   3.41     3.41     661    1429   931.0    847
k-10-ps-2000-svd-70     3.41   3.41   3.41     3.41     686    1852   1062.7   1035
k-20-ps-4000-svd-10     3.40   3.40   3.40     3.40     1146   3167   2145.3   2193
k-20-ps-4000-svd-20     3.40   3.40   3.40     3.40     1164   4652   2434.6   2194
k-20-ps-4000-svd-50     3.40   3.41   3.41     3.41     770    3011   1744.1   1965
k-20-ps-4000-svd-70     3.40   3.41   3.41     3.41     741    1648   1100.2   998
k-50-ps-10000-svd-10    3.39   3.39   3.39     3.39     964    2450   1521.5   1503
k-50-ps-10000-svd-20    3.39   3.40   3.40     3.40     1967   5791   2997.4   2754
k-50-ps-10000-svd-50    3.39   3.41   3.40     3.40     1618   9826   4041.0   3011
k-50-ps-10000-svd-70    3.40   3.41   3.40     3.40     1302   8497   4512.1   4474
k-100-ps-20000-svd-10   3.38   3.38   3.38     3.38     790    1350   954.3    913
k-100-ps-20000-svd-20   3.38   3.38   3.38     3.38     1056   5533   2407.1   2209
k-100-ps-20000-svd-50   3.38   3.39   3.39     3.38     1692   12388  5438.9   4327
k-100-ps-20000-svd-70   3.38   3.39   3.39     3.39     1311   13853  3896.1   2904


Table 7: Piecy on StructuredWithNoise (continued), experiments belong to class I. 


Group           Cost (x 10^10)                  Running time
                min    max    average  median   min    max    avg      med

k = 10, piece size 2000
np-10-svd-10    3.33   3.33   3.33     3.33     768    863    817.4    820
np-10-svd-20    3.33   3.33   3.33     3.33     902    953    917.4    906
np-10-svd-50    3.33   3.33   3.33     3.33     873    925    902.1    906
np-10-svd-75    3.33   3.33   3.33     3.33     907    959    933.4    942
np-15-svd-10    3.33   3.33   3.33     3.33     819    902    879.8    894
np-15-svd-20    3.33   3.33   3.33     3.33     880    949    905.7    895
np-15-svd-50    3.33   3.33   3.33     3.33     870    912    887.1    882
np-15-svd-75    3.33   3.33   3.33     3.33     867    953    897.7    888
np-50-svd-10    3.33   3.33   3.33     3.33     929    1064   997.7    997
np-50-svd-20    3.33   3.33   3.33     3.33     989    1094   1023.9   999
np-50-svd-50    3.33   3.33   3.33     3.33     954    1094   1004.0   980
np-50-svd-75    3.33   3.33   3.33     3.33     1032   1188   1107.9   1120

k = 20, piece size 4000
np-10-svd-50    3.33   3.33   3.33     3.33     974    1222   1136.4   1186
np-10-svd-75    3.33   3.33   3.33     3.33     1074   1277   1181.9   1160
np-15-svd-10    3.32   3.32   3.32     3.32     862    999    919.0    887
np-15-svd-20    3.32   3.32   3.32     3.32     1106   1202   1158.3   1170
np-15-svd-50    3.33   3.33   3.33     3.33     1014   1249   1121.1   1105
np-15-svd-75    3.33   3.33   3.33     3.33     1118   1262   1177.1   1164
np-50-svd-10    3.32   3.32   3.32     3.32     1238   1666   1408.0   1383
np-50-svd-20    3.32   3.32   3.32     3.32     1033   1761   1338.5   1225
np-50-svd-50    3.32   3.33   3.32     3.32     1500   1805   1638.1   1650
np-50-svd-75    3.32   3.33   3.33     3.33     1256   1843   1450.8   1358

Table 8: Piecy-mr on random instances with d = 1000 and n = 10® (continued), experiments belong to class II. 


Group     Cost                                                 Running time
          min          max          average      median       min   max   avg    med

svd-15    5.91 x 10^   5.91 x 10^   5.91 x 10^   5.91 x 10^   58    68    62.9   63
svd-20    5.62 x 10^   5.62 x 10^   5.62 x 10^   5.62 x 10^   62    89    70.0   66

Table 9: Piecy-mr on LowerBound instances with d = 10010, n = 10000 and m, k = 10. The piece size is fixed to 2000, the number
of pieces is fixed to 10. Experiments belong to class II.

Group                    Cost                                                 Running time
                         min          max          average      median       min   max    avg     med

ps-10000-np-5-svd-75     3.34 x 10®   3.35 x 10®   3.34 x 10®   3.34 x 10®   663   1086   843.8   829
ps-10000-np-10-svd-75    3.34 x 10®   3.35 x 10®   3.34 x 10®   3.34 x 10®   453   769    632.2   629
ps-10000-np-10-svd-50    3.33 x 10®   3.34 x 10®   3.34 x 10®   3.34 x 10®   426   873    627.0   609


Table 10: Piecy-MR on StructuredWithNoise with n = 1000000, d = 500 and k = 50. Experiments belong to class III. 


Group                         Cost (x 10^9)                   Running time
                              min    max    average  median   min    max    avg      med

n-5000-d-1000
k-10-ps-2000-np-10-svd-10     1.70   1.70   1.70     1.70     346    397    375.9    382
k-10-ps-2000-np-10-svd-20     1.70   1.70   1.70     1.70     401    456    422.4    422
k-10-ps-2000-np-10-svd-50     1.70   1.70   1.70     1.70     410    482    434.3    430
k-10-ps-2000-np-10-svd-70     1.70   1.70   1.70     1.70     431    495    458.3    463
k-10-ps-2000-np-15-svd-10     1.70   1.70   1.70     1.70     365    450    397.0    396
k-10-ps-2000-np-15-svd-20     1.70   1.70   1.70     1.70     399    443    423.3    431
k-10-ps-2000-np-15-svd-50     1.70   1.70   1.70     1.70     381    452    408.6    407
k-10-ps-2000-np-15-svd-70     1.70   1.70   1.70     1.70     399    454    422.0    427
k-10-ps-2000-np-50-svd-10     1.70   1.70   1.70     1.70     369    540    441.2    437
k-10-ps-2000-np-50-svd-20     1.70   1.70   1.70     1.70     394    517    458.2    487
k-10-ps-2000-np-50-svd-50     1.70   1.70   1.70     1.70     386    519    449.7    458
k-10-ps-2000-np-50-svd-70     1.70   1.70   1.70     1.70     379    529    446.2    460
k-20-ps-4000-np-10-svd-10     1.70   1.70   1.70     1.70     344    410    365.0    365
k-20-ps-4000-np-10-svd-20     1.70   1.70   1.70     1.70     365    540    449.2    464
k-20-ps-4000-np-10-svd-50     1.70   1.70   1.70     1.70     438    559    506.3    519
k-20-ps-4000-np-10-svd-70     1.70   1.70   1.70     1.70     494    585    536.1    545
k-20-ps-4000-np-15-svd-10     1.70   1.70   1.70     1.70     343    430    382.4    387
k-20-ps-4000-np-15-svd-20     1.70   1.70   1.70     1.70     420    513    473.2    482
k-20-ps-4000-np-15-svd-50     1.70   1.70   1.70     1.70     479    575    512.5    512
k-20-ps-4000-np-15-svd-70     1.70   1.70   1.70     1.70     475    532    502.0    508
k-20-ps-4000-np-50-svd-10     1.70   1.70   1.70     1.70     457    629    518.4    529
k-20-ps-4000-np-50-svd-20     1.70   1.70   1.70     1.70     416    719    518.2    518
k-20-ps-4000-np-50-svd-50     1.70   1.70   1.70     1.70     387    648    462.4    459
k-20-ps-4000-np-50-svd-70     1.70   1.70   1.70     1.70     386    626    499.3    531
k-50-ps-10000-np-10-svd-10    1.70   1.70   1.70     1.70     369    546    425.5    420
k-50-ps-10000-np-10-svd-20    1.69   1.70   1.70     1.70     502    934    643.9    642
k-50-ps-10000-np-10-svd-50    1.69   1.70   1.69     1.69     558    1112   772.5    769
k-50-ps-10000-np-10-svd-70    1.70   1.70   1.70     1.70     571    883    706.7    713
k-50-ps-10000-np-15-svd-10    1.69   1.70   1.69     1.69     371    508    432.8    449
k-50-ps-10000-np-15-svd-20    1.69   1.70   1.69     1.69     595    1108   779.6    811
k-50-ps-10000-np-15-svd-50    1.70   1.70   1.70     1.70     568    1195   758.1    750
k-50-ps-10000-np-15-svd-70    1.70   1.70   1.70     1.70     647    1345   998.4    1074
k-50-ps-10000-np-50-svd-10    1.70   1.70   1.70     1.70     417    1671   1035.1   1077
k-50-ps-10000-np-50-svd-20    1.70   1.70   1.70     1.70     510    1723   1235.6   1412
k-50-ps-10000-np-50-svd-50    1.70   1.70   1.70     1.70     556    2039   1172.7   1322
k-50-ps-10000-np-50-svd-70    1.70   1.70   1.70     1.70     540    3294   1575.1   1484
k-100-ps-20000-np-10-svd-10   1.69   1.69   1.69     1.69     339    480    416.3    425
k-100-ps-20000-np-10-svd-20   1.69   1.69   1.69     1.69     444    947    703.7    736
k-100-ps-20000-np-10-svd-50   1.69   1.69   1.69     1.69     690    1933   1217.5   1234
k-100-ps-20000-np-10-svd-70   1.69   1.70   1.69     1.69     1084   1751   1433.1   1539
k-100-ps-20000-np-15-svd-10   1.69   1.69   1.69     1.69     356    550    461.5    476
k-100-ps-20000-np-15-svd-20   1.69   1.69   1.69     1.69     728    1392   1064.5   1108
k-100-ps-20000-np-15-svd-50   1.69   1.69   1.69     1.69     812    2401   1488.3   1571
k-100-ps-20000-np-15-svd-70   1.69   1.70   1.69     1.69     954    2004   1719.4   1907
k-100-ps-20000-np-50-svd-10   1.69   1.69   1.69     1.69     400    759    529.2    475
k-100-ps-20000-np-50-svd-20   1.69   1.69   1.69     1.69     750    1805   1262.4   1346
k-100-ps-20000-np-50-svd-50   1.69   1.69   1.69     1.69     525    2593   1482.6   1874
k-100-ps-20000-np-50-svd-70   1.69   1.70   1.69     1.69     393    4402   2053.8   2213

n-10000-d-500
k-10-ps-2000-np-10-svd-10     3.36   3.36   3.36     3.36     357    393    374.5    375
k-10-ps-2000-np-10-svd-20     3.35   3.36   3.36     3.36     407    429    415.1    418
k-10-ps-2000-np-10-svd-50     3.35   3.36   3.36     3.36     410    468    436.9    441


Table 11: Piecy-MR on StructuredWithNoise. Experiments belong to class II.

Group                         Cost (x 10^9)                   Running time
                              min    max    average  median   min    max    avg      med

k-10-ps-2000-np-10-svd-70     3.36   3.36   3.36     3.36     419    477    449.1    448
k-10-ps-2000-np-15-svd-10     3.35   3.36   3.35     3.35     366    421    398.4    407
k-10-ps-2000-np-15-svd-20     3.35   3.36   3.36     3.36     377    436    401.0    403
k-10-ps-2000-np-15-svd-50     3.36   3.36   3.36     3.36     404    449    426.8    433
k-10-ps-2000-np-15-svd-70     3.36   3.36   3.36     3.36     427    469    446.7    450
k-10-ps-2000-np-50-svd-10     3.36   3.36   3.36     3.36     392    493    437.8    444
k-10-ps-2000-np-50-svd-20     3.35   3.36   3.36     3.36     388    501    440.8    457
k-10-ps-2000-np-50-svd-50     3.36   3.36   3.36     3.36     419    547    475.1    478
k-10-ps-2000-np-50-svd-70     3.36   3.37   3.36     3.36     427    534    490.7    512
k-20-ps-4000-np-10-svd-10     3.35   3.35   3.35     3.35     351    397    367.1    370
k-20-ps-4000-np-10-svd-20     3.34   3.35   3.35     3.35     442    538    480.6    490
k-20-ps-4000-np-10-svd-50     3.35   3.36   3.35     3.36     479    572    528.8    543
k-20-ps-4000-np-10-svd-70     3.35   3.36   3.36     3.36     540    608    564.4    568
k-20-ps-4000-np-15-svd-10     3.34   3.35   3.34     3.34     378    464    423.7    431
k-20-ps-4000-np-15-svd-20     3.35   3.35   3.35     3.35     473    535    504.0    508
k-20-ps-4000-np-15-svd-50     3.35   3.36   3.36     3.36     428    541    489.8    499
k-20-ps-4000-np-15-svd-70     3.36   3.36   3.36     3.36     462    595    539.8    573
k-20-ps-4000-np-50-svd-10     3.35   3.35   3.35     3.35     463    690    561.3    561
k-20-ps-4000-np-50-svd-20     3.35   3.35   3.35     3.35     395    698    566.8    620
k-20-ps-4000-np-50-svd-50     3.35   3.36   3.35     3.35     520    959    703.2    725
k-20-ps-4000-np-50-svd-70     3.35   3.36   3.36     3.36     567    960    698.1    686
k-50-ps-10000-np-10-svd-10    3.33   3.35   3.34     3.35     353    502    407.5    398
k-50-ps-10000-np-10-svd-20    3.33   3.35   3.34     3.34     528    922    663.0    644
k-50-ps-10000-np-10-svd-50    3.33   3.34   3.34     3.34     625    1158   815.1    788
k-50-ps-10000-np-10-svd-70    3.34   3.35   3.34     3.34     641    1116   905.3    948
k-50-ps-10000-np-15-svd-10    3.33   3.33   3.33     3.33     389    503    434.6    430
k-50-ps-10000-np-15-svd-20    3.33   3.34   3.33     3.33     544    956    692.6    726
k-50-ps-10000-np-15-svd-50    3.34   3.35   3.34     3.35     747    1510   1075.8   1099
k-50-ps-10000-np-15-svd-70    3.34   3.36   3.35     3.35     854    1487   1029.0   985
k-50-ps-10000-np-50-svd-10    3.34   3.34   3.34     3.34     583    899    706.5    737
k-50-ps-10000-np-50-svd-20    3.33   3.34   3.34     3.34     685    1797   1223.7   1346
k-50-ps-10000-np-50-svd-50    3.34   3.35   3.34     3.34     694    3635   1562.7   1628
k-50-ps-10000-np-50-svd-70    3.34   3.35   3.35     3.35     677    2375   1499.1   1566
k-100-ps-20000-np-10-svd-10   3.32   3.33   3.33     3.33     374    617    445.7    440
k-100-ps-20000-np-10-svd-20   3.33   3.33   3.33     3.33     520    860    682.1    726
k-100-ps-20000-np-10-svd-50   3.32   3.32   3.32     3.32     706    1835   1175.4   1358
k-100-ps-20000-np-10-svd-70   3.32   3.33   3.33     3.33     740    2379   1199.0   1185
k-100-ps-20000-np-15-svd-10   3.32   3.32   3.32     3.32     393    596    483.4    501
k-100-ps-20000-np-15-svd-20   3.32   3.32   3.32     3.32     454    1138   815.9    863
k-100-ps-20000-np-15-svd-50   3.32   3.34   3.33     3.33     706    2377   1229.5   1093
k-100-ps-20000-np-15-svd-70   3.33   3.34   3.33     3.33     1066   2000   1490.9   1559
k-100-ps-20000-np-50-svd-10   3.34   3.34   3.34     3.34     431    1816   938.0    1007
k-100-ps-20000-np-50-svd-20   3.33   3.33   3.33     3.33     746    3256   1520.7   1363
k-100-ps-20000-np-50-svd-50   3.33   3.33   3.33     3.33     485    4337   1847.7   1414
k-100-ps-20000-np-50-svd-70   3.33   3.34   3.33     3.34     779    3507   1895.5   1815

n-10000-d-1000
k-10-ps-2000-np-10-svd-10     3.40   3.41   3.40     3.41     733    816    773.4    784
k-10-ps-2000-np-10-svd-20     3.40   3.41   3.40     3.40     805    902    846.6    847
k-10-ps-2000-np-10-svd-50     3.40   3.41   3.41     3.41     831    934    879.2    885
k-10-ps-2000-np-10-svd-70     3.40   3.41   3.41     3.41     840    950    883.7    884
k-10-ps-2000-np-15-svd-10     3.40   3.41   3.40     3.40     740    901    811.2    823
k-10-ps-2000-np-15-svd-20     3.40   3.40   3.40     3.40     800    879    845.1    861
k-10-ps-2000-np-15-svd-50     3.41   3.41   3.41     3.41     803    921    853.3    858
k-10-ps-2000-np-15-svd-70     3.41   3.41   3.41     3.41     855    941    894.7    903
k-10-ps-2000-np-50-svd-10     3.40   3.41   3.40     3.40     895    1082   952.4    948
k-10-ps-2000-np-50-svd-20     3.40   3.41   3.40     3.40     895    1101   963.4    948
k-10-ps-2000-np-50-svd-50     3.40   3.41   3.41     3.41     967    1095   1022.6   1028
k-10-ps-2000-np-50-svd-70     3.40   3.41   3.41     3.41     894    1163   1019.4   1028
k-20-ps-4000-np-10-svd-10     3.40   3.40   3.40     3.40     719    817    753.9    755
k-20-ps-4000-np-10-svd-20     3.40   3.40   3.40     3.40     940    1048   1000.9   1026
k-20-ps-4000-np-10-svd-50     3.40   3.41   3.40     3.40     968    1150   1059.6   1073
k-20-ps-4000-np-10-svd-70     3.40   3.41   3.41     3.41     1052   1140   1099.6   1116
k-20-ps-4000-np-15-svd-10     3.40   3.40   3.40     3.40     742    914    808.4    820
k-20-ps-4000-np-15-svd-20     3.40   3.40   3.40     3.40     915    1180   1006.0   984
k-20-ps-4000-np-15-svd-50     3.40   3.41   3.40     3.40     983    1118   1028.5   1012
k-20-ps-4000-np-15-svd-70     3.40   3.41   3.41     3.41     1005   1125   1055.5   1059
k-20-ps-4000-np-50-svd-10     3.40   3.40   3.40     3.40     1000   1280   1093.6   1070
k-20-ps-4000-np-50-svd-20     3.40   3.40   3.40     3.40     1079   1702   1253.9   1203


Table 11: Piecy-MR on StructuredWithNoise. Experiments belong to class II.

Group                         Cost (x 10^9)                   Running time
                              min    max    average  median   min    max    avg      med

k-20-ps-4000-np-50-svd-50     3.40   3.40   3.40     3.40     899    1692   1308.5   1334
k-20-ps-4000-np-50-svd-70     3.40   3.41   3.40     3.40     971    1429   1177.9   1181
k-50-ps-10000-np-10-svd-10    3.39   3.40   3.39     3.39     732    907    834.7    844
k-50-ps-10000-np-10-svd-20    3.39   3.40   3.40     3.40     1053   1862   1403.7   1415
k-50-ps-10000-np-10-svd-50    3.39   3.40   3.39     3.39     1213   1936   1594.9   1586
k-50-ps-10000-np-10-svd-70    3.40   3.40   3.40     3.40     1323   2073   1600.7   1578
k-50-ps-10000-np-15-svd-10    3.39   3.39   3.39     3.39     791    1140   907.5    851
k-50-ps-10000-np-15-svd-20    3.39   3.39   3.39     3.39     1240   1735   1487.8   1448
k-50-ps-10000-np-15-svd-50    3.40   3.40   3.40     3.40     1328   2359   1830.6   1794
k-50-ps-10000-np-15-svd-70    3.40   3.40   3.40     3.40     1299   2310   1778.2   1786
k-50-ps-10000-np-50-svd-10    3.39   3.40   3.39     3.39     1128   3150   1898.7   1840
k-50-ps-10000-np-50-svd-20    3.39   3.39   3.39     3.39     1578   3391   2328.9   1988
k-50-ps-10000-np-50-svd-50    3.40   3.40   3.40     3.40     1105   3595   2788.2   3003
k-50-ps-10000-np-50-svd-70    3.40   3.40   3.40     3.40     1405   4388   2864.5   3102
k-100-ps-20000-np-10-svd-10   3.39   3.39   3.39     3.39     787    1526   1006.0   848
k-100-ps-20000-np-10-svd-20   3.38   3.39   3.39     3.39     1137   1886   1352.1   1303
k-100-ps-20000-np-10-svd-50   3.38   3.39   3.38     3.38     1368   3142   2028.2   1850
k-100-ps-20000-np-10-svd-70   3.38   3.39   3.39     3.39     1510   4033   2360.9   2084
k-100-ps-20000-np-15-svd-10   3.38   3.39   3.38     3.38     741    1462   908.1    826
k-100-ps-20000-np-15-svd-20   3.38   3.38   3.38     3.38     1119   1999   1566.5   1591
k-100-ps-20000-np-15-svd-50   3.38   3.39   3.39     3.39     1349   4218   2251.3   2050
k-100-ps-20000-np-15-svd-70   3.39   3.40   3.39     3.39     1631   3786   2544.3   2344
k-100-ps-20000-np-50-svd-10   3.39   3.40   3.39     3.39     1030   2222   1583.6   1697
k-100-ps-20000-np-50-svd-20   3.39   3.39   3.39     3.39     1301   6148   3452.1   3635
k-100-ps-20000-np-50-svd-50   3.39   3.40   3.40     3.39     2311   10506  5589.8   4366
k-100-ps-20000-np-50-svd-70   3.39   3.40   3.40     3.40     912    6943   2344.5   1307


Table 11: Piecy-MR on StructuredWithNoise. Experiments belong to class II.


Conclusion. 

The experiments show the potential speed-up by using piecy and piecy-mr. When choosing the 
algorithm, one should take the dimensions of the input matrix into account. For large dimension 
but a moderate number of points, piecy is ideal since it reduces the dimension effectively with little 
overhead. For data sets where the dimension is high and the number of points is also high, the 
additional overhead of piecy-mr pays off. 
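
The rule of thumb above can be written down as a small decision function. The following is only an illustrative sketch: the function name choose_algorithm and the two thresholds high_dimension and many_points are hypothetical, since the text gives no concrete cut-off values, and falling back to BICO for low-dimensional input is an assumption based on BICO being the underlying algorithm.

```python
def choose_algorithm(num_points, dimension, high_dimension, many_points):
    """Pick an algorithm name following the rule of thumb in the conclusion.

    high_dimension and many_points are hypothetical thresholds supplied by
    the caller; the text does not specify concrete values for them.
    """
    if dimension < high_dimension:
        # Assumption: for low-dimensional data the base algorithm BICO is used.
        return "BICO"
    if num_points >= many_points:
        # High dimension and many points: the overhead of piecy-mr pays off.
        return "piecy-mr"
    # High dimension but a moderate number of points.
    return "piecy"


print(choose_algorithm(5000, 1000, high_dimension=100, many_points=100000))
```

In practice the thresholds would have to be calibrated on the data and hardware at hand, for instance from running-time measurements like those in the tables.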